Citation-based plagiarism detection - idea, implementation and evaluation

Author

  • Bela Gipp
Abstract

Currently used Plagiarism Detection Systems (PDS) rely solely on text-based comparisons. They deliver satisfactory results only if the plagiarized text is copied literally (copy&paste), copied with minor alterations (e.g. shake&paste), or machine translated. If the text is paraphrased or translated by a human, however, these methods perform very poorly. In the words of Weber-Wulff, who organizes regular comparisons of PDS, the current state of available systems can be summarized as follows: “[...] PDS find copies, not plagiarism.” In contrast to the existing approaches, Citation-based Plagiarism Detection compares the occurrences of citations in order to identify similarities. The most basic form measures the bibliographic coupling strength (citation overlap). On its own, however, this would produce numerous false positives, so it is advisable to include further factors such as the order of citations, their proximity to each other, their chance of co-occurrence, and other more sophisticated measures. If, for example, four papers are cited in a similar order in two documents, this can be interpreted as a subtle hint that the two works may not have been created independently of one another. If none of these four papers have been co-cited before in another paper, or the order of citations is identical, this might indicate plagiarism. The advantages and limitations of Citation-based Plagiarism Detection are very different from those of the currently used text-based methods. Text-matching approaches remain suitable for detecting copy&paste plagiarism, even in short passages. They also have the advantage of not requiring citation information; yet they fail to identify, for example, paraphrased, translated and idea plagiarism.
By applying the citation-based approach to the doctoral thesis of Guttenberg, a well-examined, real-world plagiarism case that has been tested with numerous conventional Plagiarism Detection Systems, we show that the citation-based approach is able to identify 13 of the 16 translated plagiarisms. Conventional methods failed to identify any of these sections. However, as expected, short passages of copy&paste plagiarism can usually be identified only by text-based approaches. Citation-based Plagiarism Detection is therefore by no means a replacement for the currently used text-based approaches, but should be considered a complement for identifying well-disguised plagiarism that is currently hard to find. Additionally, once signs of plagiarism have been found, neither the text-based nor the citation-based approaches eliminate the need for manual examination.
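
The two simplest signals named in the abstract, citation overlap and citation order, can be illustrated with a short Python sketch. This is not the implementation evaluated in the paper: the function names are hypothetical, each document is assumed to be already reduced to the list of source identifiers it cites in reading order, and a plain longest-common-subsequence computation is used here as a stand-in for "similar order". Proximity and co-citation likelihood are left out.

```python
from typing import Hashable, Sequence


def coupling_strength(cites_a: Sequence[Hashable], cites_b: Sequence[Hashable]) -> int:
    """Bibliographic coupling strength: number of distinct sources cited by both documents."""
    return len(set(cites_a) & set(cites_b))


def common_citation_order(cites_a: Sequence[Hashable], cites_b: Sequence[Hashable]) -> int:
    """Length of the longest run of citations that appear in the same relative
    order in both documents (classic longest-common-subsequence dynamic program).
    Assumption: LCS is used only as an illustrative order measure."""
    prev = [0] * (len(cites_b) + 1)
    for a in cites_a:
        curr = [0]
        for j, b in enumerate(cites_b, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical citation sequences (identifiers are placeholders, not real data).
doc_a = ["A", "B", "C", "D", "E"]
doc_b = ["X", "A", "B", "Y", "C", "D"]  # same four sources, same order
doc_c = ["D", "C", "B", "A"]            # same four sources, reversed order

print(coupling_strength(doc_a, doc_b), common_citation_order(doc_a, doc_b))  # 4 4
print(coupling_strength(doc_a, doc_c), common_citation_order(doc_a, doc_c))  # 4 1
```

Both comparisons share four sources, so coupling strength alone cannot tell them apart; only the order measure separates the suspicious pair from the one that merely draws on the same literature. Either score is a hint for manual examination, not a verdict.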

Similar resources

Identifying Related Work and Plagiarism by Citation Analysis

This updated and revised paper gives an overview of my PhD research. It focuses on two newly developed approaches. Citation Proximity Analysis (CPA) allows the identification of related work by analyzing the co-occurrence of citations within documents. In contrast to co-citation analysis various factors, such as the proximity of citations to each other, are taken into account. The second approa...

Citation-Based Plagiarism Detection: Practicability on a Large-Scale Scientific Corpus

The automated detection of plagiarism is an information retrieval task of increasing importance as the volume of readily accessible information on the web expands. A major shortcoming of current automated plagiarism detection approaches is their dependence on high character-based similarity. As a result, heavily disguised plagiarism forms, such as paraphrases, translated plagiarism, or structur...

CitePlag: A Citation-based Plagiarism Detection System Prototype

This paper presents an open-source prototype of a citation-based plagiarism detection system called CitePlag. The underlying idea of the system is to evaluate the citations of academic documents as language independent markers to detect plagiarism. CitePlag uses three different detection algorithms that analyze the citation sequence of academic documents for similar patterns that may indicate u...

State-of-the-art in detecting academic plagiarism

The problem of academic plagiarism has been present for centuries. Yet, the widespread dissemination of information technology, including the internet, made plagiarising much easier. Consequently, methods and systems aiding in the detection of plagiarism have attracted much research within the last two decades. Researchers proposed a variety of solutions, which we will review comprehensively in...

EMAS Framework For Text Plagarism Detection (Evolutionary Multi-Agent System)

The ultimate goal of research remains to enhance science and technology. Scientists, research scholars and teachers are dedicated to research. But it has been observed that, in order to achieve success, research methodology is being plagiarized. Investigating and identifying genuine research innovation is a demand of today's research domain. Idea, innovation and invention are vital for today’s research domain...

Journal title:
  • TCDL Bulletin

Volume 8

Publication date: 2012